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Abstract 

Background: Hematopoiesis is a highly orchestrated developmental process that comprises various developmental 
stages of the hematopoietic stem cells (HSCs). During development, the decision to leave the self-renewing state 
and selection of a differentiation pathway is regulated by a number of transcription factors. Among them, genes 
GATA-1 and PU.l form a core negative feedback module to regulate the genetic switching between the cell fate 
choices of HSCs. Although extensive experimental studies have revealed the mechanisms to regulate the 
expression of these two genes, it is still unclear how this simple module regulates the genetic switching. 

Methods: In this work we proposed a mathematical model to study the mechanisms of the GATA-PU.1 gene network 
in the determination of HSC differentiation pathways. We incorporated the mechanisms of GATA switch into the 
module, and developed a mathematical model that comprises three genes GATA-1, GATA-2 and PU.1. In addition, a 
novel multiple-objective optimization method was designed to infer unknown parameters in the proposed model by 
realizing different experimental observations. A stochastic model was also designed to describe the critical function of 
noise, due to the small copy numbers of molecular species, in determining the differentiation pathways. 

Results: The proposed deterministic model has successfully realized three stable steady states representing the 
priming and different progenitor cells as well as genetic switching between the genetic states under various 
experimental conditions. Using different values of GATA-1 synthesis rate for the GATA-1 protein availability in the 
chromatin sites during the time period of GATA switch, stochastic simulations for the first time have realized 
different proportions of cells leading to different developmental pathways under various experimental conditions. 

Conclusions: Mathematical models provide testable predictions regarding the mechanisms and conditions for 
realizing different differentiation pathways of hematopoietic stem cells. This work represents the first attempt at 
using a discrete stochastic model to realize the decision of HSC differentiation pathways showing a multimodal 
distribution. 




Background 

Hematopoiesis is a highly orchestrated developmental pro- 
cess that comprises the proliferation, differentiation and 
maturation of a very small population of self-renewing, 
pluripotent hematopoietic stem cells (HSC) for producing 
different types of blood cells, including erythrocyte, mega- 
karyocyte, granulocyte, and macrophage [1,2]. During 
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development, the decision to leave the self-renewing state 
and selection of a differentiation pathway are regulated by 
transcription factors (TFs) [3-6]. Intense experimental stu- 
dies during the past two decades have suggested that tight 
regulation of HSC differentiation is controlled by the 
interaction of a number of genetic and epigenetic regula- 
tors of gene transcription, including the two TFs PU.l and 
GATA-1. Although the precise mechanisms to initiate the 
transcriptional cascade leading to different differentiated 
cells are not clear currently, experimental studies have 
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established that both PU.l and GATA-1 'autoregulate' 
themselves, i.e. they stimulate their own production, as 
well as they are mutually antagonistic, i.e. they repress the 
production of each other [7-9]. In the erythrocyte/mega- 
karyocite lineage high expression levels of gene GATA-1 
and low levels of PU.l were detected [6,10]; conversely, in 
the granulocyte/macrophage lineage higher expression 
levels of PU.l and low levels of GATA-1 were measured 
[5,11]. However, the initial progenitor cells stay in the 
third state that has low-level activation of both genes PU.l 
and GATA-1. When a progenitor cell differentiates, it 
transitions from the initial indeterminate state into one of 
two differentiated states with high expression levels of 
either gene GATA-1 or PU.l. Thus the full understanding 
of the PU.l and GATA-1 interaction is important for 
studying of the differentiation process of HSCs, which may 
be very useful in the clinical application of "differentiation 
therapy" for re-establishing the correct expression of PU.l 
and/or GATA-1 within immature leukemic cells [12]. 

To accurately describe the regulatory mechanisms 
controlling HSC differentiation, the important aspects 
that any mathematical model of the GATA-1 -PU.l net- 
work must include are an indeterminate state for the 
progenitor cells and two stable attractors of the dynami- 
cal system for the differentiated lineages. Since the first 
modeling attempt to study the regulation in the GATA- 
1-PU.l network [13], a number of mathematical formal- 
isms have been developed to realize the three stable 
steady states in HSCs. For example, Huang et al. used 
the Hill function with high co-operativity coefficients to 
qualitatively compare computer predictions with experi- 
mental evidence [14], giving support to the idea that 
lineage choice occurs as a two stage process, first prim- 
ing and then differentiating. To remove the requirement 
of high co-operativity, Chickarmane et al. assumed that 
the autoregulation at both PU.l and GATA-1 occurs 
through the binding of monomers [15]. It was predicted 
that an additional mechanism should be involved in the 
repressive interaction to create a bistable switch, and 
therefore a third unspecified gene was introduced to 
create bistability. Alternatively, a mathematical model 
was proposed to include the dynamics of the inactive 
heterodimer GATA-l-PU.l and the Michaelis-Menten 
function was used to represent the low co-operativity 
[16]. We have designed a model by separating the 
strength of co-operativity for autoregulation and repres- 
sion, and successfully realized a rich variety of systems 
behavior that have been found cross the existing models 
[17]. Recently Huang and collaborators proposed a gen- 
eral model that assumed no explicit interaction between 
the two genes; and realized a degenerated steady state 
via a new type of bifurcation [18]. Since the assumptions 
in these models are based on either unrealistic high co- 
operativity or unspecified gene, additional mechanisms 



are needed to accurately describe the differentiation 
decision of HSCs. 

GATA-2 is one of the six members of the hematopoietic 
GATA factor family and is most abundantly expressed in 
HSCs as well as in immature progenitors in hematopoietic 
lineages [10]. Previous studies demonstrated a critical role 
of GATA-2 in the emergence and maintenance of HSCs 
[19]. In addition, strict regulation of genes GATA-1 and 
GATA-2 is critical for proper lineage commitment and 
development of erythroid cells. It was reported that 
GATA-2 directly activated GATA-1 expression in early 
erythroid progenitors, and then GATA-1 accelerated its 
expression after its own expression has been initiated [20]. 
The gene expression process involving GATA-1 -mediated 
displacement of GATA-2 from chromatin is termed a 
GATA switch [21-23]. This "GATA factor switch" sug- 
gests a model whereby GATA-2 and GATA-1 sequentially 
bind the same cw-elements with activating and repressive 
effects, respectively. During GATA factor switch, the 
GATA-1 expression will increase due to the reciprocal 
decrease of GATA-2, which leads progenitor cells to 
adopt an erythroid lineage. However, when GATA-1 is 
absent or its expression is delayed, the reduction of 
GATA-2 will increase the expression levels of PU.l and 
leads progenitor cells to adopt a myeloid lineage [24,25]. 
Therefore the dynamic expression patterns of GATA-1 
and GATA-2 may influence the erythroid-myeloid cell-fate 
selection by regulating the expression of gene PU.l [22]. 
Although accumulating experimental evidence has sug- 
gested the important role of gene GATA-2 in regulating 
the cell-fate selection, only a simple Boolean network 
model has been proposed recently to include the regula- 
tory function of gene GATA-2 [26]. The kinetic dynamics 
of GATA-2 and in particular the function of GATA switch 
has not been systematically studied so far. 

Recent experimental studies have demonstrated that 
gene expression is a stochastic process. Key species of 
molecules such as DNA and mRNA may have small copy 
numbers, and the change of their copy numbers may 
cause significant variations of the system dynamics 
[27-30]. In particular, it has been shown that a variety of 
lineage-restricted genes in HSCs were expressed at low 
levels [31]. A recent study directly demonstrated that sto- 
chastic oscillation expression of lineage-associated genes 
could drive cell-fate commitment [32]. Accumulating 
experimental evidence also suggested that stem cells are 
heterogeneous, with cells moving between two or more 
metastable states [33] . Recently a minimal model has been 
designed by combining cell-extrinsic and cell-intrinsic ele- 
ments of regulation to understand how both instructive 
and stochastic events could inform cell commitment deci- 
sion in hematopoiesis [34] . However, compared with the 
advances in developing various stochastic models to inves- 
tigate the key functions of noise in genetic and cellular 
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processes [35-37], the critical role of noise in determining 
the stem cell differentiation has not been well established. 
This work is aimed at developing the first stochastic 
model to explore the critical function of GATA switch 
and noise in determining the differentiation pathways of 
HSCs. 

Methods 

Mathematical model of GATA-PU.1 gene network 

In this work we propose a mathematical model for the 
GATA-PU.l regulatory network including genes GATA-1, 
GATA-2 and PU.l (Figure 1A). It was assumed that each 
of these three genes activates their own expression 
through positive autoregulation. In addition, TF GATA-2 
activates the expression of gene GATA-1 but inhibits the 
expression of gene PU.l, but the function of GATA-2 to 
regulate the expressions of genes GATA-1 and PU.l is 
moderate [24]. Thus the expression levels of three genes 
GATA-1, GATA-2 and PU.l may maintain at an inter- 
mediate state, which is compatible to the steady state of 
lineage priming. In addition, GATA-1 inhibits the expres- 
sion of GATA-2 and PU.l, whereas PU.l inhibits the 
expression of GATA-1 and GATA-2. Detailed assumptions 
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Figure 1 Diagram of the GATA-PU.1 regulatory network. (A) The 

genetic regulation of the GATA-PU.1 network. (B) (Normalized) 
expression levels of genes GATA-1 and GATA-2 during GATA switch. 
Time was in arbitrary unit. 



of regulatory mechanisms and the list of chemical reac- 
tions are given in Additional file 1. 

Based on the GATA switch model (Figure IB), 
GATA-2 localizes at the chromatin sites in early stage 
erythroblasts. When the expression levels of GATA-1 
increase as erythropoiesis progresses, GATA-1 displaces 
GATA-2 from chromatin sites and often (but not 
always) instigates a distant transcriptional output [21]. 
Although remaining in the cell, TF GATA-2 in fact is 
unable to access the chromatin sites. To model this spatial 
regulatory mechanism, it was assumed that GATA-2 pro- 
teins degrade during the process of GATA switch, which 
was realized by a large degradation rate constant k* 2 of 
GATA-2 in Eq. (1). Simultaneously, the concentration of 
GATA-1 increases by using an additional synthesis rate of 
GATA-1, which is proportional to the removal of GATA-2. 

Experiments also showed that, when GATA-1 was not 
present, the shRNA knockdown of GATA-2 increases 
the expression of gene PU.l and reprograms the cells to 
become macrophages [24]. To realize genetic switching, 
the mechanism of the GATA switch and that of PU.l 
knockdown were included in a single framework. Thus 
the knockdown of GATA-2 was realized by a large 
degradation rate constant k 2 of GATA-2 during a parti- 
cular time period. In addition, a very small synthesis 
rate of GATA-1 during that time period means the 
absence of GATA-1 protein in the DNA promoter 
region. Furthermore, for simplicity, the basal expression 
rates of these three genes are assumed to be zero. We 
use the Shea- Ackers formalism, which is a widely used 
thermodynamic approach, to represent the gene expres- 
sion based on the structure of transcription machinery 
[38]. All these assumptions led to the following model 
to realize the genetic switching of the GATA-PU.l regu- 
latory network, given by 



a\x + a 2 y 



dx 

dt as + a^x + asy + a^z + ajxz 

dy_ hy 

dt b 2 + bsx + b^y + bsz + b^yz 
dc C\z 



dt c 2 + C3X + C4V + C5Z + C(,xz + Cjyz 



k\x + n,k* 2 y 
hy - k* 2 y 
k 3 z 



(1) 



where x, y and z are the concentrations of TFs GATA-1, 
GATA-2 and PU.l, respectively, a lt b\ and c l represent 
the expression rates of genes GATA-1, GATA-2 and PU.l 
auto-regulated by itself, respectively, a 2 is the expression 
rate of gene GATA-1 regulated by TF GATA-2, k\, k 2 and 
k 3 are the degradation rates of TFs GATA-1, GATA-2 and 
PU.l, respectively, k* 2 is the degradation rate constant to 
represent the displacement of GATA-2 proteins, 



If; 



k 20 t e [h,t 2 ] 
0 else 



(2) 
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during the time period [f 1( t 2 ] , and ^ is a parameter 
to adjust the availability of GATA-1 proteins in the 
chromatin sites. The GATA switch model is realized by 
using k* 2 > 0, fi > 0 with a moderate value of parameter 
fi (e.g. \i = 1). In contrast, the knockdown of GATA-2 is 
realized by using k 2 > 0, fi = 0. Another realization of 
the GATA-2 knockdown is to use rate constants k* 2 = 0, 
/>i = 0 for the expression of gene GATA-2, namely to 
block the synthesis process of GATA-2 but maintain the 
degradation process of GATA-2 unchanged. Numerical 
results did not show any substantial difference between 
the computer simulations of these two types of realiza- 
tion (results not shown). Finally we note that the pro- 
posed model (Eq. 1) includes a recently designed model 
for the GATA-l-PU.l module [18] as a special case if 
gene GATA-2 is removed from the system. 

Model parameters estimated from experimental data 

There are 23 rate constants in the proposed mathemati- 
cal model (Eq. 1). Part of the parameters was deter- 
mined by the following experimental data. 

(1) The half-life of GATA-1 protein is one hour [39]. 
In addition, the half-life of GATA-2 is 30 min [40] 
that was confirmed by another observation in [33] 
therein. The half-life of PU.l in Mel cell is -2.4 h 
[41]. Thus the protein degradation rates constants 
are Aj = 0.6931///, l 2 = 1.3863///, A 3 = 0.2888///. 

(2) The disassociate rate of GATA-1 binding to its 
DNA promoter is = 2.8 nM, which is more stable 
than the binding of GATA-2 to its promoter site Kj = 
4.4 nM [42]. In addition, the disassociate rate of PU.l 
binding to its DNA promoter is K d = 170 nM [43]. 
These rates were used to determine coefficients « 4) £> 4) 
and c 5 . 

(3) In addition, the heterodimer GATA-l-PU.l has a 
3-fold increase of the binding rate constant over 
GATA-1 to DNA [7]. It was assumed that a 7 = 3a 4 . 

(4) The disassociation rates of GATA-1 and GATA-2 
binding to the DNA promoter sites are very close to 
each other [44]. It is assumed that the binding rates 
of GATA-1 and GATA-2 to the same DNA binding 
sites are the same, namely b 6 = a 7 , and c 7 = c 6 . 



Multiple-objective optimization approach 

To infer the unknown parameters in model (Eq. 1), we 
designed a novel multiple-objective optimization approach 
to estimate unknown model parameters (Figure 2). The 
criteria in this approach include the conditions of tristabil- 
ity, genetic switching and robustness property of the 
model. This approach includes the following major steps: 

Step 1. Generate a set of model parameters in the 
GA. We first used the uniform distributed random 
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Figure 2 Schematic representation of the multiple-objective 
optimization algorithm for estimating model parameters. This 
algorithm includes five major steps, namely to generate initial sets 
of model parameters from a genetic algorithm (GA) in Step 1; to 
find the three steady states of the model with a particular set of 
parameters in Step 2; to validate the existence of the three steady 
states in Step 3; to examine the existence of genetic switching in 
Step 4; and to conduct robustness analysis of the model based on 
the property of genetic switching in Step 5. Finally the penalty 
function value (PFV) is sent back to the GA for the selection of the 
optimal set of parameters. 



variable £7(0,1) generated in the genetic algorithm (GA) 
to determine the initial model rate constants. Note that 
the mutation manipulation in the GA was carried out 
on the random samples rather than model parameters. 
Since there are 7 restricted conditions for the existence 
of stable steady states (Eq. 6-8), other parameters were 
directly determined by the random samples a, = r it 
where r, - £7(0,1). Seven parameters were determined by 
the random samples together with the restricted condi- 
tions. For example, according to condition (Eq. 7), para- 
meter b 3 is determined by 



h = 



1 



k 2 



where r w - £7 (0,1) and x\ is the steady state (Eq. 4). 
In addition, based on condition (Eq. 6), the four synth- 
esis rates were also determined by a factor k, 



ha 3 k 2 b 2 , 
x fe, a 2 = r 2 n, bi = x fe, 



Cl 



hc 2 



x fe, 



where r, - £7(0,1). We tested different values of k in 
order to realize genetic switching. When k is not large 
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(1 < k < 100), we failed to find a set of parameters that 
could realize genetic switching. Thus in this work we 
used k = 1000 to search the unknown synthesis rate. 

Step 2. Find the third steady state. The generated 
parameter set in Step 1 ensures the existence of the two 
stable steady states in which either GATA-1 or PU.l has 
high expression levels. To find the third stable steady 
state representing the primed progenitor state, we used 
MATLAB function fsolve.m to solving the nonlinear sys- 
tem for the steady state of the model (Eq. 1). If we can 
find the third stable steady state, we go to Step 3. 
Otherwise, we set the penalty function value to 4 and 
go to Step 6. 

Step 3. Validate the existence of three steady states. 

Since the third steady state found in Step 2 was 
obtained by a numerical method, it may not exist due to 
the computational error. To examine the existence of 
the three steady states, we perturbed each steady state 
using samples of the uniformly distributed random vari- 
able and used the perturbed steady state as the initial 
condition to simulate system (Eq. 1). If the simulation 
converges to the steady state, it means the steady state 
exists, and then we go to Step 4. Otherwise, we set the 
penalty function value to 3 and go to Step 6. 

Step 4. Validate the existence of genetic switching. 
We next examined whether the system can realize 
genetic switching using the mechanisms of GATA 
switch and GATA-2 knockdown. The large degradation 
rate constant of GATA-2 in model (Eq. 1) for these two 
mechanisms is 

[ 20, 500 < t < 1500 

k* 2 = 

0, else. 

Simultaneously, the additional synthesis rate constant 
of GATA-1 in model (Eq. 1) is 

1, GATA switch 

11 ~ 0, GATA - 2 knockdown. 

If the system realizes genetic switching through these 
two mechanisms, go to Step 5. Otherwise, set the pen- 
alty function value to 2 and go to Step 6. 

Step 5. Robustness analysis. The inferred parameter set 
now satisfies the requirement for realizing three stable 
steady states and genetic switching. Next we used robust- 
ness property of the system as an additional criterion to 
choose the optimal model parameters. We tested the 
robustness property of mathematical model using the per- 
turbed model parameters [45,46]. Each parameter in the 
model was perturbed by a uniformly distributed random 
number 



where ~ £7(0,1) and a = 0.5 is the perturbation 
strength. For each set of model parameters, we obtained 
1000 sets of perturbed parameters and then examined 
whether the mathematical model with the perturbed 
model parameters still maintained the three steady 
states. The model with a particular set of model para- 
meters is more stable if the model maintains the three 
steady states with more sets of perturbed model para- 
meters. To make an unbiased comparison, we used the 
same random samples U jk for different sets of para- 
meters. The penalty function value at this step is the 
percentage of the parameter sets (from the 1000 sets of 
perturbed parameters) with which the model does not 
maintain the three steady states. Thus a smaller value of 
penalty function means better robustness property of 
the model. 

Step 6. Return the value of penalty function to the 
genetic algorithm. 

Note that there are four penalty functions in this pro- 
posed inference method, namely the existence of three 
steady states in Step 2 (denoted as event Oi), validation 
of these three steady states in Step 3 (denoted as 0 2 ), 
validation of the existence of genetic switching in Step 4 
(denoted as O3), and robustness property of the model 
over 1000 perturbed sets of model parameters in Step 5 
(denoted as 0 4 ). Thus, if a numerical test in Steps 2-4 is 
not satisfied, the penalty score of that step should be lar- 
ger than the maximal score of the next step. Since the 
maximal penalty score of Step 5 is set to unit one if none 
of the perturbed parameter set maintains genetic switch- 
ing, the scores of the first three penalty functions are set 
to 4, 3 and 2, respectively. Using the notations of objec- 
tive functions [47], the multi-objective function is repre- 
sented by 

min{4P(O0 + 3P(O 2 |P(O0 = 0) + 2P(0 3 \P(0 2 ) = 0) + P(O l \P(0 3 ) = 0)}. 

Here the probability of objective Oi is defined by, for 
example, 

0 existing three steady states 

1 otherwise 

Similar definitions are applied to the probabilities of 
objectives 0 2 and 0 3 . 

Results 

Steady states of the mathematical model 

When k* 2 = 0, the proposed mathematical model (Eq. 1) 
has up to four steady states. When kia 4 * 0 and k 3 c 5 * 
0, three steady states can be obtained analytically, given 

by 



Ojk = <Jj(l + cr*(Uj k - 0.5)), ) = !,..., 23, k = 1, . . . , 1000, 



{xo.yo.zo) = (0,0,0) 



(3) 
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(*i,»,zi)-(^4^0 f 0) (4) 

(* 2 ,y 2 ,z 2 ) = (0,0,^-^% (5) 

K3C5 

The existence conditions of these three steady states 
are given in Theorem 1. 

Theorem 1 (1) The trivial steady state (Eq. 3) is 
unstable if any one of the following conditions is satis- 
fied 

a\ > a^ki, bi > b 2 k 2 , and ci > c 2 fc3. (6) 

(2) The steady state with high expression levels of 
gene GATA-1 (Eq. 4) is stable if the following conditions 
are satisfied 

bi < k 2 (b 2 + ^3X1), and c\ < ks(c 2 + c$xi). (7) 

(3) The steady state with high expression levels of 
gene PU.l (Eq. 5) is stable if the following conditions 
are satisfied 

fli < fei(fl3 + a&z 2 ), and b\ < k 2 (b 2 + bsz 2 ). (8) 

The proof of this theorem is provided in the Addi- 
tional file 1. 

However, it is difficult to derive an analytical expres- 
sion of the fourth potential steady state. For a given set 
of model parameters, we have to examine the existence 
of the fourth steady state numerically on a case-by-case 
basis. The existence conditions of the stable steady 
states will be used as criteria to search the unknown 
model parameters. 

Inference of model parameters 

The proposed model (Eq. 1) has 20 unknown parameters 
by setting « 3 = b 2 = c 2 = 1. We first estimated the values 
of 9 model parameters, namely (a 4 , a 7 , b 4 , b 6 , c 5 , k lt k 2 , k 3 ) 
and (c 6 = c 7 ) from the published experimental data dis- 
cussed in the model section. Since there is not any pub- 
lished data for the temporal dynamics of gene expression, 
we used the designed multiple-objective optimization 
approach to estimate the remaining 11 unknown para- 
meters (Figure 2). We used the genetic algorithm as an 
effective tool to search the optimal model parameters to 
realize genetic switching. A MATLAB toolbox developed 
by Chipperfield et al. [48] was used to infer the unknown 
model parameters. This toolbox used MATLAB functions 
to build a set of versatile routines for implementing a wide 
range of genetic algorithms. The initial estimate of rate 
constants can be changed by using different random seeds 
in the MATLAB toolbox, leading to different final esti- 
mates of the rate constants [49]. The genetic algorithm 
was run over 100 generations for each estimate, and we 



used a population of 1000 individuals in each generation. 
Our tests showed that a smaller population of individuals 
(i.e. 100 or 200) failed to produce an estimate of model 
parameters with which the model has tristability property. 
We implemented the genetic algorithm with different 
initial kinetic rates and obtained a number of estimated 
parameter sets. The parameter set having the best robust- 
ness property was selected as the final estimate. 

Figure 3 gives simulations of the deterministic model 
(Eq. 1) using the degradation rate constant of GATA-2 
(Eq. 2) and different values of the GATA-1 synthesis 
rate [i. The concentrations of GATA-2 decreased sub- 
stantially during t G [500, 1500] when a large degrada- 
tion rate k 2 = 6 was applied during that time period. 
Figures 3A, 3B and 3C give a simulation using the 
GAT A switch module that was realized by ft = 1. When 
the concentration of GATA-2 decreases during t G [500, 
1500] in Figure 3B, GATA-1 reaches the high expression 
level quickly in Figure 3 A. However, when GATA-1 is 
absent = 0) and GATA-2 is knockdown, Figures 3D, 
3E and 3F show a simulation leading to the high expres- 
sion level of PU.l. The third scenario is that the expres- 
sion of GATA-1 maintains at a low level during the 
period of GATA-2 decreases when fi = 0.32. In this case 
the expression levels of these three genes in Figures 3G, 
3H and 31 maintain at an intermediate state that is 
dependent on the value of p. Neither GATA-1 nor PU.2 
reached the steady state of high expression levels. When 
the expression level of GATA-2 returned to the normal 
state at t = 1500, the system also returned to the primed 
steady state, which is an unsuccessful genetic switching. 

When inferring the model parameters, we only tested 
the robustness property of the model with perturbation 
strength a = 0.5 (see Step 5 in the Optimization Approach 
section). The next question is whether different values of 
the perturbation strength may lead to different selections 
of the estimated model parameters. To answer this ques- 
tion, we selected 10 sets of the estimated model para- 
meters that have better robustness properties than the 
others, and examined the robustness property of the 
model using different perturbation strengths ranging from 
0.1 to 1. Figure 4 shows that, the robustness property of 
the model with a smaller strength (<7 = 0.3) is slightly dif- 
ferent from that using the perturbation strength (a = 0.5) 
because only a small numbers of perturbed parameters 
sets did not maintain the tristability property. However, 
when a > 0.4, the robustness property based on different 
values of perturbation strength a is consistent with that 
using (o~ = 0.5). Figure 4 also presents the averaged robust- 
ness properties based on 10 values of the perturbation 
strength (a = 0.1, 0.2, 1), which is well consistent with 
that using a = 0.5. Thus it is suggested that the inferred 
model parameter is independent of the perturbation 
strength. 
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Bifurcation analysis 

An important question in the estimation of model para- 
meters is to what extent the model with the varied para- 
meter still maintains the three stable steady states. To 
answer this question, we examined the tristability of the 
proposed model (Eq. 23) under the variations of one 
model parameter. For each rate constant, the perturbed 
values range from 0 to the 2-fold of the estimated value. 
We first tested the influence of the four synthesis rates a\, 
a-i, b\ and C\ since these parameters are important in regu- 
lating the expression of these three genes. Figure 5 shows 
that the tristability property is not sensitive to the change 
of synthesis rate a 2 , which suggested that the expression of 
GATA-1 is not sensitive to the regulation of GATA-2. 
The result is consistent with the experimental observation 



showing that this regulation is relatively weaker than other 
genetic regulations in the system. Interestingly, the system 
maintains tristability if the synthesis rate of GATA-1 (a{) 
or PU.l (ci) increased, or the synthesis rate of GATA-2 
(&i) decreased. However, the tristability property of the 
system disappears when these three rate constants vary 
along the opposite direction, in particular for the synthesis 
rates of GATA-2 and PU.l. The discontinuity of the 
curves in Figure 5C suggested that the tristability property 
of the system disappears when the value of b\ is below a 
threshold value. 

Stochastic dynamics 

We have successfully realized the three steady states of 
the HSCs and genetic switching using the genetic 
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regulations between GATA genes and gene PU.l. How- 
ever, the proposed deterministic model failed to describe 
the heterogeneity in the differentiation decision of the 
HSCs. To tackle the challenge, we designed a stochastic 
model to realize the function of intrinsic noise arising 
from key species with small copy numbers, such as 
DNA and mRNA. Using our proposed stochastic model- 
ling approach [50], we designed the following stochastic 
model based on the developed deterministic model (Eq. 1), 
given by 

Zlt + r) =Ztt)+P\ — rl -PffeZrl 

v ' u lc 2 + c 3 X + aY + c 5 Z + c 6 XZ + c 7 YZ ] L 1 

where X, Y, and Z are the copy numbers of protein 
GATA-1, GATA-2 and PU.l respectively, r is the step- 
size, P(X) is a Poisson random variable with mean A. 
Protein concentrations in model Eq. (1) were transferred 
into the molecular copy numbers in this stochastic 
model. To reduce the computing time of stochastic 
simulations, the mechanisms of genetic switching were 
realized in the time period 



k* = 



25 50 < t < 200 
0 else 



(10) 



Figure 6 gives three stochastic simulations using the 
same rate constants in Figure 4 and fi = 0.28. At time 
point t = 50, the expression level of GATA-2 decreased 
quickly to the basal level due to the large degradation 
rate k* 2 . Although the expression levels of genes GATA-1 
and PU.l increased quickly from time t = 50, the high 



expression level of one gene inhibited the expression of 
the other gene. Figures 6A, 6B and 6C show that the 
high expression levels of GATA-1 inhibited the expres- 
sion of gene PU.l and GATA-2, leading the cell into the 
erythrocyte lineage; whereas the expression levels of 
GATA-1 failed to increase further in Figures 6D and its 
expression was inhibited by the growing expression 
levels of PU.l in Figure 6F, leading the cell into the 
granulocyte lineage. In addition, it was possible that the 
expression levels of neither gene GATA-1 nor PU.l 
were high enough to inhibit the expression of the other 
gene. Figures 6G, 6H and 61 gave a simulation of unsuc- 
cessful switching in which the expression levels of these 
three genes maintained at an intermediate state that was 
determined by the value of fi. When the degradation 
rate of GATA-2 returned to the normal value at t = 
300, the expression levels of these three genes returned 
to the original priming state. 

To obtain the statistical property of genetic switching, 
we generated 1000 stochastic simulations for each value 
of fi over the range of [0, 1.5]. When the value is small 
(fi < 0.11), which mimicked the experimental condition 
of gene GATA-2 knock-down, Figure 7 shows that all 
simulations have high expression levels of gene PU.l. 
On the contrary, when the value of fi is large {fi > 1.4), 
all simulations were switched to the state with high 
expression levels of gene GATA-1, which represents the 
mechanisms of GATA switch that lead HSCs to the ery- 
throcyte lineage. In addition, there is a range of values 
(0.12 <fi < 0.22) with which the system failed to realize 
genetic switching and all stochastic simulations stay at 
the priming state. When the value of fi locates between 
these windows, stochastic simulations can realize two or 
three different steady states. For example, when the 
value of fi is 0.11 < fi < 0.12, the system state is in either 
the priming state or granulocyte lineage. An important 
prediction in Figure 7 is that, when the value of fi is 
(0.22 <fi < 0.37), stochastic simulation of the system 
may lead to anyone of the three steady states, namely 
the erythrocyte, priming and granulocyte lineages, such 
as the case demonstrated in Figure 6. 

Discussion 

In this work we proposed a mathematical model to study 
the mechanisms of the GATA-PU.l gene network in the 
determination of HSC differentiation pathways. The 
novelty of this model is the inclusion of gene GATA-2 and 
the GATA switch model based on the experimentally 
determined regulatory mechanisms. Our simulation 
results suggested that, based on the experimental deter- 
mined regulatory mechanisms, the addition of the third 
gene, namely gene GATA-2, is necessary and adequate to 
realize the three stable steady states of the HSCs. This 
result is consistent with the prediction that a connector 
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gene X is required to realize the primed state [15]. 
Although the third gene negatively regulates the expres- 
sion of PU.1 in these two models, the regulatory mechan- 
isms of the third gene in these two models are not 
completely the same. For example, it was assumed that 
gene GATA-1 positively regulate the expression of the 
unknown gene X [15], rather than the negative regulation 
of gene GATA-2 by gene GATA-1 in this work. Compared 
with the model in [15] in which additional and unknown 
external signals are required to maintain the tristability 
property, regulations between these three genes in our 
proposed model are adequate to maintain the tristability 
property of the system, which represents a successful 
approach in utilizing experimentally confirmed regulatory 



mechanisms to realize tristability property of the HSCs in 
regulating the erythroid-myeloid lineage decision. 

The proposed model in this work for a network of 
three genes is a general framework that includes a 
recently published model for a network of two genes as a 
special case [18]. Unlike the model of two genes, which 
realized tristability using the bifurcation of the system by 
increasing the ratio of two model parameters, our model 
maintains the tristability property over a wide range of 
model parameter values. A related interesting question is 
the minimal motif to realize the stability property of a 
regulatory network with different numbers of steady 
states. It has been demonstrated that the two-gene mod- 
ule with self-activation and mutual repression can realize 
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stability property with two steady states [50-53]. 
Although attempts have been made to realize tristability 
property of this two-gene module using various assump- 
tions of regulatory mechanisms [13,14,17], our research 
demonstrated that a regulatory module with three genes 
could maintain stability property with three steady states 
without the assumption of the autoregulation via high 
order multimers. A similar result is that a three-compo- 
nent motif with four links can realize tristability [54]. 
However, the GATA-PU.l module has six links together 
with more complex regulatory mechanisms. It would be 
interesting to analyze theoretically the stability property 
of a regulatory module with the maximal number of 
steady states under various regulatory mechanisms. 

Stochastic simulations in this work predicted that the 
synthesis rate of GATA-1 during the decreasing process of 
GATA-2 determines the probability of erythroid-myeloid 



lineage decision. Since this synthesis rate represents the 
availability of GATA-1 in the genomic regulatory regions 
during the GATA switch, there are two potential 
resources of noise to explain the variations of GATA-1 
proteins in the genomic regions. First, recent experimental 
research investigating cellular processes at single cells has 
revealed convincing evidence showing large heterogeneity 
in protein abundance and dynamics among genetically 
identical cells [55]. The variations of protein concentration 
in different cells may lead to different rates for GATA-1 to 
enter the regulatory regions during the process of GATA 
switch. The heterogeneity in protein abundance may be 
one of the reasons to explain the differentiation preference 
of HSCs for the erythroid lineage or the myeloid lineage. 
In addition, the intrinsic noise due to the small copy num- 
bers of molecular species in the GATA-PU.l module may 
further contribute the heterogeneity to the HSC lineage 
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differentiation decision even for cells having the same line- 
age decision preference. These various resources of noise 
in gene expression may be part of the transcriptome-wide 
noise proposed by Huang and colleagues [32] . Therefore 
more comprehensive stochastic models are needed to 
explain the functions of other types of noise in the deci- 
sion of lineage selection. 

Conclusions 

In summary, we proposed a mathematical model to study 
the mechanisms of the GATA-PU.l gene network in the 
determination of HSC differentiation pathways. In addi- 
tion, a multiple-objective optimization approach was 
developed to infer model parameters in order to realize 
the three stable steady states representing the three differ- 
ent types of blood cells and genetic switching. A stochastic 
model was also designed to describe the function of noise 
in determining the differentiation pathways. Stochastic 
simulations successfully realized different proportions of 
cells leading to different developmental pathways under 
the same experimental conditions, and provided testable 
predictions regarding the conditions and mechanisms to 
realize different differentiation pathways. This work repre- 
sents the first attempt at using a discrete stochastic model 
to realize the decision of HSC differentiation pathways 
with a multimodal distribution. 
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